A Case of Industrial vs. Open-source OCL: Not So Different After All

Authors

  • Josh G. M. Mengerink
  • Jeroen Noten
  • Ramon R. H. Schiffelers
  • Mark van den Brand
  • Alexander Serebrenik
Abstract

When studying model-driven engineering (MDE) in industry, generalizing findings is often hard, as most artifacts are proprietary and confidential in nature. A possible solution would be to study open-source artifacts. However, open-source artifacts are not necessarily representative of those found in industry. As a first step towards investigating the viability of open-source MDE artifacts as an alternative to less accessible industrial ones, we use a large open-source dataset and several industrial metamodels to show that the complexity of OCL expressions in open source and in industry is similar.

I. MOTIVATION AND GOALS

Model-Driven Engineering (MDE) is used in industry to assist engineers in specifying systems [1], [2]. By using MDE to create domain-specific languages (DSLs), engineers can specify these systems in terms of their domain rather than encoding them in general-purpose languages. However, the metamodels that underpin these DSLs are often highly complex, and at some point their expressive power is no longer sufficient to model the domain accurately. For instance, type systems require extra expressive power [3]. To mitigate this deficiency of metamodels, more complex mechanisms such as the Object Constraint Language (OCL) [4] have been proposed. OCL allows DSL engineers to write down complex constraints on valid models, so that the domain can be modeled more accurately.

OCL has been the subject of many studies in a variety of contexts, such as usage [5], [6], [7], verification [8], [9], and maintenance [10]. Several of these studies have concluded that a lack of data might threaten the generalizability of their conclusions [5], [10]. This lack of data is particularly acute for industrial data, as most industrial applications of MDE (and thus of OCL) are proprietary and therefore confidential.
We envision that open-source artifacts can be used as a means to demonstrate and evaluate the practical limitations of techniques proposed to analyze [11], [12] and visualize OCL [13]. For open source, it is easier to create large, publicly available datasets [5], [7] that support generalization and replication of results. To evaluate techniques on open-source artifacts and derive conclusions that are valid for industry, there should be sufficient evidence that open-source artifacts are representative of industrial practice. While similar observations have been made for non-MDE software [14], it is not a priori clear that this is also the case for OCL. Hence, a wide range of measurements should be performed to test for differences between open-source MDE artifacts and their industrial counterparts.

As a first step, in this work we test whether the complexity of open-source OCL expressions differs from that of industrial ones. We have chosen to start with complexity because it encompasses various aspects of artifacts; as such, it should serve as a good indicator of similarity or difference between open source and industry. In our previous work [7], we constructed a publicly available dataset of over 9000 OCL expressions. We compare this dataset with data obtained from industry and ask the following research question:

Do the complexities of open-source and industrial OCL code differ?

II. DATA DESCRIPTION AND ANALYSIS

We analyze a dataset of OCL expressions¹ previously mined from open-source GitHub projects [7], and a dataset of OCL expressions from industrial projects by ALTRAN. The GitHub dataset includes .ocl and .ecore files (.ecore files are included because they may contain embedded OCL expressions). It contains over 9000 OCL expressions obtained from those files, i.e., more than ten times as many as the datasets used in previous studies [5], and it includes the dataset of Cabot².
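Since the GitHub dataset also draws OCL from .ecore files, the following is one possible way to locate such embedded constraints. It is a minimal illustration, not the mining procedure of [7] or of EMMA [15]. It assumes that embedded OCL bodies are stored as `details` entries of `eAnnotations` whose `source` is one of the two OCL delegate URIs in `OCL_SOURCES`. The helper `embedded_ocl` and the sample package are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumption: OCL bodies embedded in .ecore files live in EAnnotations with one of
# these source URIs (legacy and Pivot-based OCL delegates).
OCL_SOURCES = {
    "http://www.eclipse.org/emf/2002/Ecore/OCL",
    "http://www.eclipse.org/emf/2002/Ecore/OCL/Pivot",
}


def embedded_ocl(ecore_xml: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: list (owner name, constraint name, OCL body) triples
    found in the XMI text of an .ecore file."""
    root = ET.fromstring(ecore_xml)
    found = []
    for owner in root.iter():  # every element, including the root EPackage
        for annotation in owner.findall("eAnnotations"):
            if annotation.get("source") not in OCL_SOURCES:
                continue  # e.g. the plain Ecore annotation that only lists names
            for detail in annotation.findall("details"):
                found.append((owner.get("name", ""),
                              detail.get("key", ""),
                              detail.get("value", "")))
    return found


# Toy .ecore content (hypothetical package, not taken from the dataset).
SAMPLE = """<ecore:EPackage xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ecore="http://www.eclipse.org/emf/2002/Ecore"
    name="people" nsURI="http://example.org/people" nsPrefix="people">
  <eClassifiers xsi:type="ecore:EClass" name="Person">
    <eAnnotations source="http://www.eclipse.org/emf/2002/Ecore">
      <details key="constraints" value="nonNegativeAge"/>
    </eAnnotations>
    <eAnnotations source="http://www.eclipse.org/emf/2002/Ecore/OCL/Pivot">
      <details key="nonNegativeAge" value="self.age &gt;= 0"/>
    </eAnnotations>
  </eClassifiers>
</ecore:EPackage>"""

print(embedded_ocl(SAMPLE))  # [('Person', 'nonNegativeAge', 'self.age >= 0')]
```

Walking the whole element tree, rather than only classifiers, also picks up bodies attached to operations or derived features, whose owner name is then the feature name.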
¹ https://github.com/tue-mdse/ocl-dataset
² https://github.com/jcabot/ocl-repository

The ALTRAN dataset is derived from seven metamodels obtained from ALTRAN, a large company offering third-party MDE services. Using EMMA, our EMF (Meta)Model Analysis tool [15], we extracted 73 OCL expressions.

To compare the datasets, we focus on complexity. Complexity is one of the most studied aspects of software quality, both in MDE and in traditional software [16], [17], [18]. For OCL expressions, complexity has been operationalized as “the number of distinct properties” used by an expression [5]. For instance, the expression “context Person inv: self.age >= 0” has a complexity of one, as it only references the age property of Person. On the other hand, the expression “context Auto inv: self.registration >= self.constructionYear” has a complexity of two, as it references both registration and constructionYear.

To determine whether the complexities of open-source and industrial OCL code differ, we apply a Mann-Whitney-Wilcoxon test [19]. We opt for this test because it is non-parametric [20], i.e., it makes no assumptions about the shape of the underlying distributions, and it is robust in the presence of populations of unequal sizes [19]. Moreover, it is commonly used in software engineering research [21]. We therefore take as null hypothesis (H0): “The distributions of complexity of the samples of industrial and open-source OCL expressions represent two populations with the same median values”, leaving as alternative hypothesis (Ha): “The distributions of complexity of the samples of industrial and open-source OCL expressions represent two populations with different median values”. To reject the null hypothesis, we use the traditional significance threshold of 0.05.

III. RESULTS AND DISCUSSION

We start by inspecting Figure 1, which shows a violin plot [22] of the computed complexities.
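Figure 1 plots the distinct-property complexity defined in Section II. As a rough illustration of how that count could be computed, here is a minimal sketch under a purely textual assumption: every `.name` navigation that is not followed by `(` counts as a property reference, and properties are identified by name alone. The regex and the function `ocl_complexity` are hypothetical; a faithful implementation would work on a parsed OCL abstract syntax tree, and string literals containing dots would fool this version.

```python
import re

# Hypothetical approximation: a property reference is a ".name" navigation that is
# not an operation call ".name(...)". Collection operations use "->" and are skipped.
# (?!\w) stops the regex from backtracking into a shorter name such as "siz" in "size(".
_NAVIGATION = re.compile(r"\.\s*([A-Za-z_]\w*)(?!\w)(?!\s*\()")


def ocl_complexity(expression: str) -> int:
    """Approximate the number of distinct properties used by an OCL expression."""
    return len(set(_NAVIGATION.findall(expression)))


for expr in ("context Person inv: self.age >= 0",
             "context Auto inv: self.registration >= self.constructionYear"):
    print(ocl_complexity(expr), expr)
```

On the two examples from Section II it returns 1 and 2, respectively. Counting a set of names rather than occurrences means that `self.age >= 0 and self.age < 150` still has a complexity of one.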
The median, Q3, and maximum complexity of the open-source OCL expressions from GitHub (2, 3, and 36, respectively) are higher than those of the industrial expressions from the ALTRAN dataset (1, 2, and 5, respectively). A statistical comparison of the distributions, however, yields a Mann-Whitney-Wilcoxon p-value of 0.05591, which slightly exceeds the traditional threshold of 0.05. Hence, as far as expression complexity is concerned, the observed differences are not large enough to claim that the complexity distributions are statistically different, and we have no evidence that industrial OCL expressions differ from open-source ones. We thus conclude that future results obtained for open-source OCL expressions are likely to be valid for industrial OCL expressions as well.

The validity of this conclusion may be threatened by the limited size of the ALTRAN dataset, which may not be representative of industrial practice in general. Due to the proprietary nature of industrial models, there is little we can do about this. However, as the open-source dataset is publicly available,³ we encourage readers to replicate our study on their own proprietary datasets.

³ https://github.com/tue-mdse/ocl-dataset

Fig. 1: Violin plot of OCL expression complexity (axis from 0 to 35) for Open-Source (N = 9173) and Industrial (N = 93) expressions. Open-source OCL expressions appear to be slightly more complex than the industrial ones.

A concern often raised with data mined from GitHub is that some of it may merely be examples rather than “real” artifacts (cf. [23]). We inherit this threat from the previous work of Noten et al. [7]. Of the 16502 Ecore files in this dataset, 3280 contain the word “example” in their path; for OCL files, this holds for 150 of 890. Overall, 3430 of 17392 files, i.e., circa 20% of the dataset files, are thus examples.
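The comparison reported in Section III can be approximated with the following sketch of a two-sided Mann-Whitney-Wilcoxon test. It is not necessarily the implementation the authors used: it assumes the normal approximation with mid-ranks for ties, the usual tie-corrected variance, and a 0.5 continuity correction, so on real data its p-value may differ slightly from an exact or differently corrected computation. The function name `mann_whitney_u` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney-Wilcoxon test (normal approximation).

    Uses mid-ranks for ties, the tie-corrected variance and a 0.5 continuity
    correction. Returns (U statistic of x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    # Pool both samples, remembering which sample (0 = x, 1 = y) each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_sizes = []
    i = 0
    while i < n:
        # Find the run pooled[i..j] of equal values and give each the average rank.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        tie_sizes.append(j - i + 1)
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:  # all values identical: no evidence of a difference
        return u, 1.0

    # Continuity-corrected z score, clamped at 0 so p never exceeds 1.
    z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u, p


# Toy complexity values, not data from the study.
u, p = mann_whitney_u([1, 2, 2, 3], [2, 3, 4, 5, 5])
print(f"U = {u}, p = {p:.4f}, reject H0 at 0.05: {p < 0.05}")
```

Passing the open-source complexities as `x` and the industrial ones as `y` and comparing `p` with 0.05 mirrors the decision rule of Section II. Tie handling matters here because complexities are small integers, so many expressions share the same value.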

Publication date: 2017